Visualisations of species distributions can be very simple yet effective ways of conveying biological and ecological information, such as range, habitat, population size, fragmentation, and migration. This information can be represented as points, hulls, densities, and binned regions, to name but a few.
DAX: we could link to labs posts that show these methods?
Representing more than one species distribution in a single figure can be difficult, though, especially where there are areas of overlap. Points and hulls will obscure each other even with a degree of transparency, while densities and shaded regions can only show one species at a time. Comparisons can be made by providing two single distribution figures side-by-side, but it can be inconvenient to observe range overlaps in this manner.
Here, we demonstrate a method to visualise distributions of multiple species with overlapping ranges on the same map, with only a small loss in resolution. The technique is a novel twist on the commonly used hexbin map: instead of using a colour fill to represent presence/absence or counts within each hexagon, we use multiple coloured points within each hexagon to represent presence/absence of species, allowing users to get a broad overview of how multiple species are distributed across an area.
ADD A SECTION HERE EXPLAINING WHAT THE STEPS ARE GOING TO BE e.g. We’ll do this by: - download records for species of interest
- create a hex grid for the area of interest
- set up points within hexes, assign species to points
- combine basemap, hex grid, points into single figure
Let’s begin by loading the R packages we’ll be using.
library(galah)
library(ozmaps)
library(sf)
library(tidyverse)
library(stringr)We’ll use the {galah} package to download occurrence records from the Atlas of Living Australia (ALA). To do this, you’ll need to register your email address with the ALA, then pass it to {galah} using galah_config().
CALLUM: would you prefer to use your work email address?
galah_config(email = "your-email@email.com")Download data
Since our goal here is to map distributions of multiple species, we’ve chosen honeyeaters from the genus Melithreptus: this is a distinctive group of 7 small- to medium-sized, short-billed and square-tailed honeyeaters with overlapping distributions across Australia.
We can get taxonomic information about this group using atlas_species()…
melithreptus <- galah_call() |>
galah_identify("Melithreptus") |>
atlas_species()
melithreptus# A tibble: 7 × 10
kingdom phylum class order family genus species author species_guid
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Animalia Chordata Aves Passeriformes Melip… Meli… Melith… (Viei… https://bio…
2 Animalia Chordata Aves Passeriformes Melip… Meli… Melith… (Vigo… https://bio…
3 Animalia Chordata Aves Passeriformes Melip… Meli… Melith… Gould… https://bio…
4 Animalia Chordata Aves Passeriformes Melip… Meli… Melith… (Goul… https://bio…
5 Animalia Chordata Aves Passeriformes Melip… Meli… Melith… (Less… https://bio…
6 Animalia Chordata Aves Passeriformes Melip… Meli… Melith… (Goul… https://bio…
7 Animalia Chordata Aves Passeriformes Melip… Meli… Melith… Gould… https://bio…
# ℹ 1 more variable: vernacular_name <chr>
… and then use this information to download occurrence records for the 7 species. We’ll apply the general data quality profile to filter out low quality records, pass in the list of species we’re interested in with galah_identify(), filter records to 2022 1 that also have spatial coordinates, and fall within one of the IBRA bioregions as a proxy for Australian records only.
DAX: not sure what the convention is here - do we store downloaded data as RDS files and load them instead of downloading each time? Also I can’t remember where they get saved
species_occ <- galah_call() |>
galah_apply_profile(ALA) |>
galah_identify(melithreptus$species) |>
galah_filter(year == 2022,
!is.na(cl1048), # IBRA bioregions
!is.na(decimalLatitude),
!is.na(decimalLongitude)) |>
galah_select(decimalLatitude,
decimalLongitude,
species,
vernacularName,
scientificName) |>
atlas_occurrences()
head(species_occ)# A tibble: 6 × 5
decimalLatitude decimalLongitude species vernacularName scientificName
<dbl> <dbl> <chr> <chr> <chr>
1 -43.6 147. Melithreptus v… Strong-billed… Melithreptus …
2 -43.6 147. Melithreptus v… Strong-billed… Melithreptus …
3 -43.5 146. Melithreptus v… Strong-billed… Melithreptus …
4 -43.5 147. Melithreptus a… Black-headed … Melithreptus …
5 -43.5 147. Melithreptus a… Black-headed … Melithreptus …
6 -43.5 147. Melithreptus v… Strong-billed… Melithreptus …
Since we’re going to be performing a few spatial operations to get species in hexagons, we’ll also convert the species_occ dataframe into a simple features object, with the latitude and longitude columns represented as points in a geometry column named occ_geometry.
species_occ_sf <- species_occ |>
st_as_sf(coords = c("decimalLongitude", "decimalLatitude"),
crs = 4326) |>
st_set_geometry("occ_geometry")Generate hex grid
Next, we’ll set up a grid of hexagons across terrestrial Australia, which we’ll use as bins for plotting summaries of species occurrence. st_make_grid() has arguments for specifying the size, type, and orientation of polygons in the grid, and this is initially created across the entire bounding box of the supplied shapefile (here the ozmap_country shapefile). We can then use a spatial filter to exclude any hexagons that do not intersect with the shapefile polygon boundary, and assign a unique identifier to each remaining hexagon.
hex_grid <- st_make_grid(ozmap_country,
cellsize = 2,
what = "polygons",
square = FALSE,
flat_topped = TRUE) |>
st_as_sf() |>
st_filter(ozmap_country) |>
st_set_geometry("hex_geometry") |>
st_transform(4326) |>
rowid_to_column(var = "hex_id")Our grid of hexagons now looks like this:
Remove empty hexes
Not every hexagon will contain an occurrence record, though, so let’s remove empty hexagons with a spatial join (which behaves similarly to dplyr::left_join() for spatial objects). This returns an object with the same number of rows as our original download of occurrence records, with additional columns containing information on which polygon each species occurs in.
hex_with_species <- st_join(x = hex_grid,
y = species_occ_sf,
join = st_intersects,
left = FALSE)
head(hex_with_species, n = 10)Simple feature collection with 10 features and 4 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 145.9652 ymin: -44.63203 xmax: 148.2746 ymax: -42.63203
Geodetic CRS: WGS 84
hex_id species vernacularName
1 1 Melithreptus validirostris Strong-billed Honeyeater
1.1 1 Melithreptus validirostris Strong-billed Honeyeater
1.2 1 Melithreptus validirostris Strong-billed Honeyeater
1.3 1 Melithreptus affinis Black-headed Honeyeater
1.4 1 Melithreptus affinis Black-headed Honeyeater
1.5 1 Melithreptus validirostris Strong-billed Honeyeater
1.6 1 Melithreptus validirostris Strong-billed Honeyeater
1.7 1 Melithreptus validirostris Strong-billed Honeyeater
1.8 1 Melithreptus validirostris Strong-billed Honeyeater
1.9 1 Melithreptus validirostris Strong-billed Honeyeater
scientificName hex_geometry
1 Melithreptus (Eidopsarus) validirostris POLYGON ((145.9652 -43.6320...
1.1 Melithreptus (Eidopsarus) validirostris POLYGON ((145.9652 -43.6320...
1.2 Melithreptus (Eidopsarus) validirostris POLYGON ((145.9652 -43.6320...
1.3 Melithreptus (Melithreptus) affinis POLYGON ((145.9652 -43.6320...
1.4 Melithreptus (Melithreptus) affinis POLYGON ((145.9652 -43.6320...
1.5 Melithreptus (Eidopsarus) validirostris POLYGON ((145.9652 -43.6320...
1.6 Melithreptus (Eidopsarus) validirostris POLYGON ((145.9652 -43.6320...
1.7 Melithreptus (Eidopsarus) validirostris POLYGON ((145.9652 -43.6320...
1.8 Melithreptus (Eidopsarus) validirostris POLYGON ((145.9652 -43.6320...
1.9 Melithreptus (Eidopsarus) validirostris POLYGON ((145.9652 -43.6320...
This means any hexagons we initially created in the grid that don’t intersect with occurrence records have been removed:
Set up positions
Since some hexagons will contain occurrence records for more than one species, we need a way to display these overlaps. We’ll do this by setting up 7 positions in each hexagon, 1 for each species, and assign each species a position and colour so they can be visually differentiated.
The figure below summarises how we’ll do this: for each hexagon remaining in the grid, we’ll generate a smaller hexagon, then get the coordinates of each vertex and centroid of the smaller hexagon.
Callum: My brain is not cooperating so I can’t actually visualise this but I trust it’s trueLet’s start by extracting the unique identifiers and spatial coordinates for every hexagon containing an occurrence record2.
Callum/Dax: I was struggling to explain the unexpected choice of functions used here, and am not sure if the explanation actually clarifies anything
unique_hex <- hex_with_species |>
count(hex_id, hex_geometry) |>
select(-`n`)We can then create a smaller hexagon within each original hexagon in the grid using st_buffer(), extract the coordinates of each vertex using st_coordinates(), and assign an integer to each vertex ranging from 1 to 7. Using pmap() here allows us to do this iteratively for every hexagon in the grid.
Some points to note: the size of the smaller hexagon is set using the dist argument in st_buffer(), and this is in turn dependent on the cellsize of the original hexagon we created (in the example above, we set cellsize = 2). Depending on the number of species you’d like to fit within each polygon and the shape of the polygon you’ve chosen, you may need to try out different values of cellsize and dist to find combinations that work best for your visualisation.
The resulting dataframe has a geometry column with spatial coordinates for each of the 7 vertices associated with the smaller hexagons we generated.
vertex_coords <- unique_hex |>
mutate(vertices = pmap(
.l = list(x = hex_geometry),
.f = function(x) {
x |>
st_buffer(dist = -0.3) |>
st_coordinates() |>
as_tibble() |>
st_as_sf(coords = c("X", "Y")) |>
select(-L1,-L2) |>
mutate(vertex_position = 1:7)
})) |>
unnest(cols = vertices)
head(vertex_coords, n = 10)Simple feature collection with 10 features and 2 fields
Active geometry column: hex_geometry
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 144.2331 ymin: -44.63203 xmax: 148.2746 ymax: -41.63203
Geodetic CRS: WGS 84
# A tibble: 10 × 4
hex_id hex_geometry geometry vertex_position
<int> <POLYGON [°]> <POINT> <int>
1 1 ((145.9652 -43.63203, 146.5… (146.3116 -43.63203) 1
2 1 ((145.9652 -43.63203, 146.5… (146.7157 -42.93203) 2
3 1 ((145.9652 -43.63203, 146.5… (147.524 -42.93203) 3
4 1 ((145.9652 -43.63203, 146.5… (147.9282 -43.63203) 4
5 1 ((145.9652 -43.63203, 146.5… (147.524 -44.33203) 5
6 1 ((145.9652 -43.63203, 146.5… (146.7157 -44.33203) 6
7 1 ((145.9652 -43.63203, 146.5… (146.3116 -43.63203) 7
8 2 ((144.2331 -42.63203, 144.8… (144.5795 -42.63203) 1
9 2 ((144.2331 -42.63203, 144.8… (144.9837 -41.93203) 2
10 2 ((144.2331 -42.63203, 144.8… (145.792 -41.93203) 3
Since we need 7 positions in each hexagon (to fit a maximum of 7 species), and we conveniently have a duplicated row for every hexagon, we can mutate the repeated 7th vertex to hold the coordinates of the centroid of each hexagon.
vertex_centroid_coords <- vertex_coords |>
mutate(geometry = ifelse(vertex_position == 7,
st_centroid(hex_geometry),
geometry)) |>
st_drop_geometry() |>
st_as_sf(crs = 4326)Assign species to positions
All we have to do now is assign a position to each species… Callum:I ended up keeping the horrible regex because it makes it easier to join in the next step- feel free to add more text to make this clearer or change the code if you can think of something better
species_data <- melithreptus |>
select(species, vernacular_name) |>
mutate(species = str_replace_all(species, "\\s*\\(.*?\\)\\s*", " "),
vernacular_name = if_else(species == "Melithreptus chloropsis",
"Gilbert's Honeyeater",
vernacular_name),
position = c(1:7))… and join that to the dataframe telling us which species occur in which hexagon.
species_points <- hex_with_species |>
st_drop_geometry() |>
select(hex_id, species) |>
distinct() |>
left_join(species_data,
by = join_by(species)) |>
left_join(vertex_centroid_coords,
by = join_by(position == vertex_position, hex_id == hex_id)) |>
st_as_sf(crs = 4326)Map
# This is colour-blind safe so using it as a placeholder for now
# but it's not the nicest palette
okabe <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
ggplot() +
geom_sf(data = ozmap_states, fill = NA, colour = "#ababab", linewidth = 0.3) +
geom_sf(data = unique_hex, fill = "#efefef55", colour = "#777777", linewidth = 0.5) +
geom_sf(data = species_points, aes(colour = species)) +
scale_colour_manual(values = okabe) +
lims(x = c(112, 155), y = c(-46, -8)) +
theme_void()
Expand for session info
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.3.2 (2023-10-31)
os macOS Sonoma 14.1.2
system aarch64, darwin20
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz Australia/Melbourne
date 2023-12-18
pandoc 3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.3.1)
forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.3.0)
galah * 2.0.0 2023-11-20 [1] CRAN (R 4.3.1)
ggplot2 * 3.4.4 2023-10-12 [1] CRAN (R 4.3.1)
htmltools * 0.5.7 2023-11-03 [1] CRAN (R 4.3.1)
lubridate * 1.9.3 2023-09-27 [1] CRAN (R 4.3.1)
ozmaps * 0.4.5 2021-08-03 [1] CRAN (R 4.3.0)
patchwork * 1.1.3 2023-08-14 [1] CRAN (R 4.3.0)
purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.0)
readr * 2.1.4 2023-02-10 [1] CRAN (R 4.3.0)
sessioninfo * 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
sf * 1.0-14 2023-07-11 [1] CRAN (R 4.3.0)
stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.3.1)
tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.3.0)
tidyr * 1.3.0 2023-01-24 [1] CRAN (R 4.3.0)
tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.3.0)
[1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
──────────────────────────────────────────────────────────────────────────────
Footnotes
There are over half a million records from this genus in the ALA, so restricting our download to records from 2022 significantly speeds things up!↩︎
Using
distinct()produces an identical result, butcount()is much faster because checking for distinct values in the geometry column is computationally intensive. If your dataframe has fewer rows, you could also do this:hex_with_species |> select(hex_id, hex_geometry) |> distinct().↩︎
